
    Negation and lexical morphology across languages: Insights from a trilingual translation corpus

    This paper proposes an exploratory cross-linguistic bird's-eye view of negative lexical morphology by examining English, French and Italian negative derivational affixes. More specifically, it aims to uncover the French and Italian equivalents of the English affixes de-, dis-, in-, non-, un- and -less. These include morphological equivalents (i.e. negative prefixes in French and Italian) as well as non-morphological equivalents (i.e. single words devoid of negative affixation, multi-word units or paraphrases). The study relies on a nine-million-word trilingual translation corpus made up of texts from the Europarl corpus and shows that the systematic analysis of translation data makes it possible to identify the major morphological dissimilarities between the three languages investigated. The frequent use of non-morphological translations in French and Italian reflects fundamental differences between the source language (English) and the two target languages (French and Italian), hence pointing to possible translation difficulties. Morphological translations, on the other hand, bring to light cross-linguistic similarities in the use of negative affixes.
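
The distinction between morphological and non-morphological equivalents can be illustrated with a small heuristic classifier. This is a sketch under stated assumptions, not the paper's annotation procedure: the prefix list `FR_NEG_PREFIXES` is a hypothetical, incomplete inventory of French negative prefixes, and the `lexicon` of attested bases is a toy set. Requiring the stripped remainder to be an attested base guards against false hits such as "important", although a full French lexicon would still contain traps (e.g. "portant"), so manual checking remains necessary.

```python
# Hypothetical, incomplete inventory of French negative prefixes (an assumption).
FR_NEG_PREFIXES = ("non-", "dés", "dé", "dis", "in", "im", "il", "ir", "mal")

def is_morphological(equivalent, lexicon):
    """True if `equivalent` is a single word whose negative prefix strips to an attested base."""
    if " " in equivalent:
        return False  # multi-word units and paraphrases count as non-morphological
    word = equivalent.lower()
    for prefix in FR_NEG_PREFIXES:
        # Accept the prefix only if what remains is a known base word.
        if word.startswith(prefix) and word[len(prefix):] in lexicon:
            return True
    return False

# Toy lexicon of bases, invented for illustration.
lexicon = {"possible", "légal", "agréable", "utile"}
for candidate in ["impossible", "désagréable", "sans emploi", "important"]:
    print(candidate, is_morphological(candidate, lexicon))
```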

    A Task-based Evaluation of French Morphological Resources and Tools

    Morphology is a key component of many Language Technology applications. However, morphological relations, especially those relying on the derivation and compounding processes, are often treated superficially. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the most relevant morphological processes. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on application performance.
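
A coverage measure of the kind mentioned above can be sketched as the share of gold-standard morphological relations that a resource also lists. This is an illustrative assumption about how coverage might be computed, not the article's exact protocol; the (derived, base) pairs below are invented toy data.

```python
def coverage(gold_pairs, resource_pairs):
    """Fraction of gold (derived, base) pairs that the resource also contains."""
    gold = set(gold_pairs)
    if not gold:
        return 0.0  # avoid division by zero on an empty gold standard
    return len(gold & set(resource_pairs)) / len(gold)

# Invented toy data: gold relations vs. the relations a hypothetical resource provides.
gold = [("lavage", "laver"), ("inutile", "utile"), ("réélection", "élection")]
resource = [("lavage", "laver"), ("inutile", "utile"), ("chanteur", "chanter")]
print(f"{coverage(gold, resource):.2f}")  # 2 of 3 gold pairs found -> 0.67
```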

    Annotating the meaning of discourse connectives by looking at their translation: The translation-spotting technique

    The various meanings of discourse connectives like while and however are difficult to identify and annotate, even for trained human annotators. This problem is all the more important as connectives are salient textual markers of cohesion and need to be correctly interpreted for many NLP applications. In this paper, we propose an alternative route to reliable annotation of connectives that makes use of the information provided by their translations in large parallel corpora. This method replaces the difficult explicit reasoning involved in traditional sense annotation with an empirical clustering of the senses that emerge from the translations. We argue that this method has the advantage of providing more reliable reference data than traditional sense annotation. In addition, its simplicity allows large annotated datasets to be built rapidly.
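
The core idea of translation spotting, labelling each occurrence of a connective by what it was translated as and reading senses off the resulting groups, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the spotted pairs are invented, and the empirical clustering step is replaced here by a hypothetical hand-written mapping `SENSE_OF` from French connectives to coarse senses.

```python
from collections import Counter, defaultdict

# Invented (English connective, spotted French translation) pairs.
spotted = [
    ("while", "tandis que"),
    ("while", "alors que"),
    ("while", "pendant que"),
    ("while", "tandis que"),
    ("however", "cependant"),
    ("however", "toutefois"),
]

# Hypothetical mapping standing in for the clustering of translations into senses.
SENSE_OF = {
    "tandis que": "contrast",
    "alors que": "contrast",
    "pendant que": "temporal",
    "cependant": "concession",
    "toutefois": "concession",
}

def sense_distribution(pairs, sense_of):
    """Label each occurrence with the sense of its translation; count senses per connective."""
    dist = defaultdict(Counter)
    for connective, translation in pairs:
        # Unmapped translations are kept as "unknown" rather than silently dropped.
        dist[connective][sense_of.get(translation, "unknown")] += 1
    return dist

dist = sense_distribution(spotted, SENSE_OF)
print(dict(dist["while"]))  # {'contrast': 3, 'temporal': 1}
```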

    Word-formation in original and translated English: source language influence on the use of un- and less

    This article aims to assess whether the word-formation features of translated language, as opposed to original language, are source language (SL)-dependent or translation-related. To do so, we analyze the use of the -less and un- negative affixes in original English and in English translated from four SLs: French, Italian, Dutch and German. Findings based on the Europarl corpus show that the use of -less and un- in translated English is partially SL-dependent.
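
Comparing affix use across sub-corpora of different sizes requires normalized frequencies. The sketch below counts un- and -less words per million tokens; the regular expressions are crude surface patterns assumed for illustration (not the article's extraction method), and the two toy sub-corpora are invented stand-ins for original and translated Europarl English.

```python
import re

# Crude surface patterns (an assumption): UN over-matches words such as
# "under", "until" or "unless", which a real study would filter out manually.
UN = re.compile(r"un[a-z]{3,}")
LESS = re.compile(r"[a-z]{3,}less")

def per_million(tokens, pattern):
    """Frequency of tokens fully matching `pattern`, normalized per million tokens."""
    hits = sum(1 for t in tokens if pattern.fullmatch(t.lower()))
    return hits / len(tokens) * 1_000_000

# Invented toy sub-corpora standing in for original and translated English.
corpora = {
    "original": "the plan is unclear and the effort was useless".split(),
    "translated_from_fr": "the rule is unfair and the measure is unfair".split(),
}
for name, tokens in corpora.items():
    print(name, round(per_million(tokens, UN)), round(per_million(tokens, LESS)))
```

Normalizing per million tokens lets sub-corpora translated from different SLs be compared directly, even though they differ in size.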

    Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies

    Translation studies rely more and more on corpus data to examine the specific features of translated texts, which may be translated from different source languages and compared with original texts. In parallel, more and more multilingual corpora are becoming available for various natural language processing tasks. This paper questions the use of these multilingual corpora in translation studies and describes the methodological steps needed to obtain more reliably comparable sub-corpora that consist only of original and directly translated text. Several experiments show the advantage of directional sub-corpora.
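
Extracting a directional sub-corpus amounts to filtering segments on metadata about the original language. The sketch assumes each segment carries hypothetical fields `original_lang` (the language originally spoken) and `pivot` (a relay language for indirect translations); these field names and the sentences are invented for illustration, since the metadata actually available differs between corpus releases.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    text: str
    lang: str                    # language of this version of the text
    original_lang: str           # language originally spoken (hypothetical field)
    pivot: Optional[str] = None  # relay language for indirect translation (hypothetical)

def directional_subcorpus(segments, target_lang, source_lang):
    """Segments in target_lang translated directly from source_lang.
    With source_lang == target_lang, this yields original (untranslated) text."""
    return [
        s for s in segments
        if s.lang == target_lang and s.original_lang == source_lang and s.pivot is None
    ]

# Invented toy segments.
segments = [
    Segment("We support the proposal.", "en", "en"),
    Segment("We support this report.", "en", "fr"),
    Segment("We agree with the rapporteur.", "en", "fi", pivot="fr"),
    Segment("Nous soutenons ce rapport.", "fr", "fr"),
]
en_from_fr = directional_subcorpus(segments, "en", "fr")
en_original = directional_subcorpus(segments, "en", "en")
```

Excluding relay translations (here, the Finnish segment routed through French) keeps each sub-corpus strictly directional.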

    Machine Translation Evaluation beyond the Sentence Level

    Automatic machine translation evaluation has been crucial to the rapid development of machine translation systems over the last two decades. So far, most attention has been paid to evaluation metrics that operate on text at the sentence level, as have the translation systems themselves. Cross-sentence translation quality depends on discourse phenomena that may not manifest at all within sentence boundaries (e.g. coreference, discourse connectives, verb tense sequence). To tackle this, we propose several document-level MT evaluation metrics: generalizations of sentence-level metrics, language-(pair)-independent versions of lexical cohesion scores, and measures of coreference and morphology preservation in the target texts. We measure their agreement with human judgment on a newly created dataset of pairwise paragraph comparisons for four language pairs.
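
To make the notion of a language-independent lexical cohesion score concrete, the sketch below computes a simple repetition-based proxy: the share of content tokens in a document that repeat a word type from an earlier sentence. This is an assumed illustration, not the metric proposed in the paper; content words are approximated by a minimum length (`min_len`, a hypothetical parameter) so that no language-specific stop list is needed.

```python
import re

def lexical_cohesion(sentences, min_len=4):
    """Share of content tokens that repeat a word type from an earlier sentence.
    Content words are approximated by length >= min_len (language-independent heuristic)."""
    seen = set()
    repeated = total = 0
    for sent in sentences:
        # \w is Unicode-aware in Python 3, so accented letters are kept.
        words = [w.lower() for w in re.findall(r"\w+", sent) if len(w) >= min_len]
        repeated += sum(1 for w in words if w in seen)
        total += len(words)
        seen.update(words)  # update after counting: within-sentence repeats are ignored
    return repeated / total if total else 0.0

doc = ["The committee approved the budget.", "The budget was criticised by the committee."]
print(round(lexical_cohesion(doc), 3))  # 2 repeats out of 6 content tokens -> 0.333
```

A score like this could be computed on both an MT output and its reference, with the gap between them serving as one rough document-level signal.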

    A Corpus-based Contrastive Analysis for Defining Minimal Semantics of Inter-sentential Dependencies for Machine Translation

    Inter-sentential dependencies such as discourse connectives or pronouns have an impact on the translation of these items. These dependencies have classically been analyzed within complex theoretical frameworks, often monolingual ones, and the resulting fine-grained descriptions, although relevant to translation, are likely beyond the reach of statistical machine translation systems. Instead, we propose an approach that searches for a minimal, feature-based characterization of translation divergences due to inter-sentential dependencies, in the case of discourse connectives and pronouns, based on contrastive analyses performed on the Europarl corpus. In addition, we show how to automatically assign labels to connectives and pronouns, and how to use them for statistical machine translation.